Mining Tandem Mass Spectral Data to Develop a More Accurate Mass Error Model for Peptide Identification

Authors

  • Yan Fu
  • Wen Gao
  • Simin He
  • Ruixiang Sun
  • Hu Zhou
  • Rong Zeng
Abstract

The assumed mass error distribution of fragment ions plays a crucial role in peptide identification from tandem mass spectra. Previous mass error models use simplistic uniform or normal distributions with empirically set parameter values. In this paper, we propose a more accurate mass error model, the conditional normal model, together with an iterative parameter learning algorithm. The new model is based on two important observations about the mass error distribution: the mean of the mass error varies linearly with the ion mass, and the logarithm of the standard deviation of the mass error varies linearly with the logarithm of the peak intensity. To our knowledge, the latter quantitative relationship has not been reported before. Experimental results demonstrate that our approach accurately quantifies the mass error distribution and that the new model improves the accuracy of peptide identification.
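The two relationships stated in the abstract can be written as a small parametric model. The following is a hedged illustration, not the authors' implementation: the parameter names `a`, `b`, `c`, `d`, the class `ConditionalNormal`, the function `fit_conditional_normal`, and the fitting procedure (alternating weighted least squares for the mean line with a regression of log absolute residuals on log intensity for the standard deviation) are all assumptions made here. It also presumes that each mass error has already been paired with a correctly matched fragment ion, and the numbers in the demo are made up.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ConditionalNormal:
    """Hypothetical parametrization of a conditional normal mass error model.

    mean(m)   = a + b * m           mean error linear in ion mass m
    log sd(I) = c + d * log(I)      log SD linear in log peak intensity I
    """

    a: float
    b: float
    c: float
    d: float

    def mean(self, mass: float) -> float:
        return self.a + self.b * mass

    def sd(self, intensity: float) -> float:
        return math.exp(self.c + self.d * math.log(intensity))

    def log_density(self, error: float, mass: float, intensity: float) -> float:
        """Normal log-density of an observed mass error given mass and intensity."""
        mu, sigma = self.mean(mass), self.sd(intensity)
        z = (error - mu) / sigma
        return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def _weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = p + q * x; returns (p, q)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    q = sxy / sxx
    return my - q * mx, q


# E[log|Z|] for Z ~ N(0, 1) is -(Euler-Mascheroni constant + ln 2) / 2.
_E_LOG_ABS_Z = -(0.5772156649015329 + math.log(2.0)) / 2.0


def fit_conditional_normal(masses, intensities, errors, n_iter=5):
    """Alternate a weighted fit of the mean line with a log-log fit of the SD.

    Hypothetical sketch; assumes each error belongs to a matched fragment ion
    and that n_iter >= 1.
    """
    log_i = [math.log(i) for i in intensities]
    ones = [1.0] * len(errors)
    ws = ones  # first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = _weighted_line(masses, errors, ws)
        # For a normal residual r, log|r| = log sd + log|Z|, so the slope of
        # log|r| on log I estimates d, and the intercept is shifted by E[log|Z|].
        log_abs_r = [math.log(max(abs(e - (a + b * m)), 1e-12))
                     for e, m in zip(errors, masses)]
        c_shifted, d = _weighted_line(log_i, log_abs_r, ones)
        c = c_shifted - _E_LOG_ABS_Z
        ws = [math.exp(-2.0 * (c + d * li)) for li in log_i]  # weights 1 / sd^2
    return ConditionalNormal(a, b, c, d)


if __name__ == "__main__":
    # Synthetic data from made-up parameters, only to exercise the fit.
    rng = random.Random(0)
    true = ConditionalNormal(a=0.01, b=2e-5, c=math.log(0.1), d=-0.5)
    masses = [rng.uniform(200.0, 2000.0) for _ in range(5000)]
    intens = [math.exp(rng.uniform(0.0, math.log(1000.0))) for _ in range(5000)]
    errors = [rng.gauss(true.mean(m), true.sd(i)) for m, i in zip(masses, intens)]
    print(fit_conditional_normal(masses, intens, errors))
```

The regression on log absolute residuals is used because, for a normal residual, log|r| equals log σ plus a noise term with a known mean, so a single line fit recovers the log-log slope directly and the intercept after a constant shift. A practical learner would also have to decide which peaks count as matches for a candidate peptide, which this sketch leaves out.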

Similar resources

Development and Evaluation of Methods for Predicting Protein Levels and Peak Intensities from Tandem Mass Spectrometry Data

Tandem mass spectrometry (MS/MS) of peptides is a central technology for proteomics, enabling the identification of thousands of proteins and peptides from a complex mixture. With the increasing acquisition rate of tandem mass spectrometers, it has become possible to use data-mining techniques to attempt to solve important biological problems using MS/MS data. These problems include (i) estimat...

Data Mining in Protein Identification by Tandem Mass Spectrometry

Protein identification (sequencing) by tandem mass spectrometry is a fundamental technique for proteomics which studies structures and functions of proteins in large scale and acts as a complement to genomics. Analysis and interpretation of vast amounts of spectral data generated in proteomics experiments present unprecedented challenges and opportunities for data mining in areas such as data p...

Spectral profiles, a novel representation of tandem mass spectra and their applications for de novo peptide sequencing and identification.

Despite many efforts in the last decade, the progress in de novo peptide sequencing has been slow with only 30-45% of all peptides correctly reconstructed. We argue that accurate full-length peptide sequencing may be an unattainable goal for some spectra and demonstrate how to accurately sequence gapped peptides instead. We further argue that gapped peptides are nearly as useful as full-length ...

PepHMM: A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search

An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden M...

A Hidden Markov Model Based Scoring Function for Mass Spectrometry Database Search

An accurate scoring function for database search is crucial for peptide identification using tandem mass spectrometry. Although many mathematical models have been proposed to score peptides against tandem mass spectra, our method (called PepHMM, http://msms.cmb.usc.edu) is unique in that it combines information on machine accuracy, mass peak intensity, and correlation among ions into a hidden M...

Journal title:
  • Pacific Symposium on Biocomputing

Publication date: 2007